Discrimination-aware Network Pruning for Deep Model Compression
Similar resources
Automated Pruning for Deep Neural Network Compression
In this work, we present a method to improve the pruning step of the current state-of-the-art methodology for compressing neural networks. The novelty of the proposed pruning technique is its differentiability, which allows pruning to be performed during the backpropagation phase of network training. This enables end-to-end learning and strongly reduces the training time. The technique is ...
A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce a three-stage pipeline of pruning, quantization, and Huffman encoding, whose stages work together to reduce the storage requirement of neural networks by 35× to 49× without affecting their accuracy. Our method...
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
Deep Compression is a three-stage compression pipeline: pruning, quantization, and Huffman coding. Pruning reduces the number of weights by 10x, and quantization further improves the compression rate to between 27x and 31x. Huffman coding gives more compression: between 35x and 49x. The compression rate already includes the metadata for the sparse representation. Deep Compression doesn’t incur loss of accu...
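The three stages named in this abstract can be illustrated with a toy sketch on a flat list of weights. This is not the authors' implementation: the helper names (`prune`, `quantize`, `huffman_code_lengths`, `encoded_bits`) are hypothetical, uniform binning stands in for the trained quantization the paper describes, and the bit count ignores the codebook and sparse-index metadata that the reported rates include.

```python
import heapq
import random
from collections import Counter


def prune(weights, keep_ratio):
    """Magnitude pruning: zero every weight below the k-th largest magnitude."""
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, levels):
    """Map surviving weights to indices 1..levels (uniform bins, levels >= 2).

    Index 0 is reserved for pruned weights. Uniform binning is an assumption
    made for brevity; it is not the trained quantization from the paper.
    """
    nonzero = [w for w in weights if w != 0.0]
    lo, hi = min(nonzero), max(nonzero)
    step = (hi - lo) / (levels - 1) or 1.0  # avoid dividing by zero
    return [0 if w == 0.0 else 1 + round((w - lo) / step) for w in weights]


def huffman_code_lengths(symbols):
    """Return the Huffman code length in bits for each distinct symbol."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (frequency, unique tie-breaker, symbols in this subtree).
    heap = [(n, i, [s]) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (n1 + n2, tie, s1 + s2))
        tie += 1
    return lengths


def encoded_bits(symbols):
    """Total bits needed to Huffman-encode the symbol sequence."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


if __name__ == "__main__":
    random.seed(0)
    dense = [random.gauss(0.0, 1.0) for _ in range(1000)]
    indices = quantize(prune(dense, keep_ratio=0.1), levels=16)
    ratio = (32 * len(dense)) / encoded_bits(indices)
    print(f"toy compression ratio vs. 32-bit floats: {ratio:.1f}x")
```

Because most indices are 0 after pruning, the Huffman code gives that symbol a very short codeword, which is why the entropy-coding stage adds compression on top of the first two. The printed ratio is only illustrative and is not comparable to the figures quoted in the abstract.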
Compression-aware Training of Deep Networks
In recent years, great progress has been made in a variety of application domains thanks to the development of increasingly deeper neural networks. Unfortunately, the huge number of units of these networks makes them expensive both computationally and memory-wise. To overcome this, exploiting the fact that deep networks are over-parametrized, several compression strategies have been proposed. T...
Universal Deep Neural Network Compression
Compression of deep neural networks (DNNs) for memory- and computation-efficient compact feature representations has become a critical problem, particularly for deployment of DNNs on resource-limited platforms. In this paper, we investigate lossy compression of DNNs by weight quantization and lossless source coding for memory-efficient inference. Whereas the previous work addressed non-universal scal...
Journal
Journal title: IEEE Transactions on Pattern Analysis and Machine Intelligence
Year: 2021
ISSN: 0162-8828,2160-9292,1939-3539
DOI: 10.1109/tpami.2021.3066410